Closing the Vocabulary Gap for Computing Text Similarity and Information Retrieval
Abstract
This paper studies the integration of lexical semantic knowledge in two related semantic computing tasks: ad-hoc information retrieval and computing text similarity. For this purpose, we compare the performance of two approaches: (i) one based on semantic relatedness, and (ii) a conventional extended Boolean model [13] with additional query expansion. For the evaluation, we use two German-language test collections that are especially suitable for studying the vocabulary gap problem: (i) GIRT [5] for the information retrieval task, and (ii) a collection of descriptions of professions, built to evaluate a system for electronic career guidance, for both the information retrieval and text similarity tasks. We found that integrating lexical semantic knowledge increases the performance for both tasks. On the GIRT corpus, the performance is improved only for short queries. The performance on the collection of professional descriptions is improved, but crucially depends on the accurate preprocessing of the natural language essays employed as topics.
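As a rough illustration of the vocabulary gap and of query expansion in general, the sketch below scores a document by plain term overlap, first with the original query and then with an expanded one. It is not the paper's method: the lexicon `RELATED_TERMS`, the helper names, and the English example text are all hypothetical, and the scoring is a simple term count rather than the extended Boolean model or a semantic relatedness measure.

```python
from collections import Counter

# Hypothetical toy lexicon of related terms (an assumption for illustration;
# the lexical semantic resources used in the paper are not described here).
RELATED_TERMS = {
    "car": {"automobile", "vehicle"},
    "doctor": {"physician"},
}


def bag_of_words(text):
    """Lowercase the text, split on whitespace, and count each term."""
    return Counter(text.lower().split())


def expand_query(terms, lexicon):
    """Return the query terms plus every related term listed in the lexicon."""
    expanded = set(terms)
    for term in terms:
        expanded |= lexicon.get(term, set())
    return expanded


def overlap_score(query_terms, document):
    """Sum the document frequencies of the query terms (plain term overlap)."""
    doc_bag = bag_of_words(document)
    return sum(doc_bag[term] for term in query_terms)


query = ["car", "repair"]
document = "automobile repair and vehicle maintenance"

# Without expansion only "repair" matches; "car" falls into the vocabulary gap.
plain = overlap_score(set(query), document)

# With expansion, "automobile" and "vehicle" also match.
expanded = overlap_score(expand_query(query, RELATED_TERMS), document)
```

In this toy case the expanded query matches more document terms than the original one, which is the effect query expansion aims for; whether it helps on real collections depends on the quality of the lexical resource and the preprocessing.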
Similar resources
Closing the Service Discovery Gap by Collaborative Tagging and Clustering Techniques
Whereas the number of services that are provided online is growing rapidly, current service discovery approaches seem to have problems fulfilling their objectives. Existing approaches are hampered by the complexity of underlying semantic service models and by the fact that they try to impose a technical vocabulary on users. This leads to what we call the service discovery gap. In this paper we ...
Information Retrieval based on Paraphrase
Text Retrieval systems based on ranking use similarity as an approximation to relevance. Most of these systems ignore word meaning. We assume that some measure of paraphrase would be a better similarity measure. We develop a concept of paraphrase based on Meaning-Text Theory and implement an approximation to the ideal using the Longman Dictionary of Contemporary English (LDOCE). The performance...
A Study of the Role of Homograph Context Types in Determining Similarity between Documents
Aim: Automatic information retrieval is based on the assumption that texts contain content or structural elements that can be used in word sense disambiguation and thereby improve the effectiveness of the results retrieved. Homographs are among the words requiring sense disambiguation. Depending on their roles and positions in texts, homograph contexts could be divided into different types, wit...
Remedies against the Vocabulary Gap in Information Retrieval
Search engines rely heavily on term-based approaches that represent queries and documents as bags of words. Text, whether a document or a query, is represented by a bag of its words that ignores grammar and word order, but retains word frequency counts. When presented with a search query, the engine then ranks documents according to their relevance scores by computing, among other things, the matchin...
Studying the Effect of Retrieval Direction during Reading on Productive and Receptive Knowledge of Vocabulary
Retrieval tasks provide learners with an opportunity to focus both on meaning and on form. There are four different retrieval directions. The present study aimed to identify the optimal direction of recall type retrievals during reading and to investigate the outcomes of each one. Forty-eight intermediate EFL learners took part in the study. One of the experimental groups was provided with the ...
Journal title:
- Int. J. Semantic Computing
Volume 2
Publication date: 2008